The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants. Only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
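To make the most commonly reported strategy concrete, the following is a minimal sketch of patch-based training preparation: a large 3D sample is cut into overlapping patches that fit into memory. The patch size, stride, and padding policy are illustrative assumptions, not values reported by the survey respondents.

```python
import numpy as np

def extract_patches(volume, patch_size=(64, 64, 64), stride=(32, 32, 32)):
    """Slide a window over a 3D volume and return overlapping patches.

    Illustrative only: sizes, strides, and the padding policy are assumptions.
    """
    # Pad dimensions that are smaller than the patch size
    pads = [(0, max(0, p - s)) for s, p in zip(volume.shape, patch_size)]
    padded = np.pad(volume, pads, mode="constant")
    patches = []
    for z in range(0, padded.shape[0] - patch_size[0] + 1, stride[0]):
        for y in range(0, padded.shape[1] - patch_size[1] + 1, stride[1]):
            for x in range(0, padded.shape[2] - patch_size[2] + 1, stride[2]):
                patches.append(padded[z:z + patch_size[0],
                                      y:y + patch_size[1],
                                      x:x + patch_size[2]])
    return np.stack(patches)

# Example: a 100^3 volume that is "too large" to process at full size in one pass
patches = extract_patches(np.random.rand(100, 100, 100))
print(patches.shape)  # (N, 64, 64, 64)
```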
Spiking neural networks (SNNs) have attracted broad attention in brain-inspired artificial intelligence and computational neuroscience. They can be used to model biological information processing in the brain at multiple scales. More importantly, SNNs serve as an appropriate level of abstraction for bringing inspiration from the brain and cognition to artificial intelligence. In this paper, we present the Brain-inspired Cognitive Intelligence Engine (BrainCog) for creating brain-inspired AI and brain simulation models. BrainCog provides different types of spiking neuron models, learning rules, brain areas, etc., as essential modules of the platform. Based on these easy-to-use modules, BrainCog supports various brain-inspired cognitive functions, including perception and learning, decision-making, knowledge representation and reasoning, motor control, and social cognition. These brain-inspired AI models have been effectively validated on various supervised, unsupervised, and reinforcement learning tasks, and they can be used to equip AI models with multiple brain-inspired cognitive functions. For brain simulation, BrainCog implements functional simulations of decision-making and working memory, structural simulations of neural circuits, and whole-brain structural simulations of the mouse brain, macaque brain, and human brain. An AI engine named BORN has been developed on top of BrainCog, demonstrating how BrainCog's components can be integrated and used to build AI models and applications. To support the scientific quest to decode the nature of biological intelligence and create AI, BrainCog aims to provide essential, easy-to-use building blocks and infrastructure support for developing brain-inspired spiking neural network based AI and for simulating the cognitive brain at multiple scales. The online repository of BrainCog can be found at https://github.com/braincog-x.
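As an illustration of the kind of spiking neuron model such a platform builds on, below is a minimal leaky integrate-and-fire (LIF) simulation. It follows generic textbook dynamics rather than BrainCog's actual API, and all parameter values are assumptions.

```python
import numpy as np

def lif_simulate(input_current, v_thresh=1.0, v_reset=0.0, tau=20.0, dt=1.0):
    """Simulate a single leaky integrate-and-fire (LIF) neuron.

    Generic textbook dynamics, not BrainCog's API; parameters are illustrative.
    """
    v = v_reset
    spikes = []
    for i_t in input_current:
        # Leaky integration of the membrane potential
        v += (dt / tau) * (-(v - v_reset) + i_t)
        if v >= v_thresh:
            spikes.append(1)   # emit a spike ...
            v = v_reset        # ... and reset the membrane potential
        else:
            spikes.append(0)
    return np.array(spikes)

# Constant supra-threshold input produces a regular spike train
print(lif_simulate(np.full(100, 1.5)).sum())
```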
Biomedical question answering aims to obtain answers to given questions from the biomedical domain. Because of its high demand for biomedical domain knowledge, it is difficult for models to learn the domain knowledge from limited training data. We propose a contextual embedding method that combines the open-domain QA model \AOA with the \BioBERT model pre-trained on biomedical domain data. We apply unsupervised pre-training on a large biomedical corpus and fine-tune on a biomedical question answering dataset. In addition, we adopt an MLP-based model-weighting layer to automatically exploit the strengths of the two models and provide the correct answer. The public dataset \BioMRC, constructed from the PubMed corpus, is used to evaluate our method. Experimental results show that our model outperforms state-of-the-art systems by a large margin.
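A minimal sketch of the MLP-based model-weighting idea: a small gate network mixes the candidate-answer scores of two readers. The feature choice and layer sizes here are assumptions and do not reproduce the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AnswerWeighting(nn.Module):
    """Fuse candidate-answer scores from two readers with a small MLP gate.

    A sketch of an MLP-based model-weighting layer; sizes and features are assumed.
    """
    def __init__(self, hidden=32):
        super().__init__()
        # The gate takes both models' scores and outputs a mixing weight in (0, 1)
        self.gate = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid()
        )

    def forward(self, scores_a, scores_b):
        # scores_a, scores_b: (batch, num_candidates) logits from the two readers
        feats = torch.stack([scores_a, scores_b], dim=-1)  # (batch, cand, 2)
        w = self.gate(feats).squeeze(-1)                   # (batch, cand)
        return w * scores_a + (1.0 - w) * scores_b         # fused scores

fused = AnswerWeighting()(torch.randn(4, 10), torch.randn(4, 10))
print(fused.shape)  # torch.Size([4, 10])
```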
Road network graphs provide critical information for autonomous driving applications, such as drivable areas that can be used by motion planning algorithms. To obtain road network graphs, manual annotation is usually inefficient and labor-intensive. Automatically detecting road network graphs can alleviate this problem, but existing works still have some limitations. For example, segmentation-based approaches cannot ensure satisfactory topological correctness, and graph-based approaches cannot provide sufficiently precise detection results. To address these problems, in this paper we propose a novel approach based on transformers and imitation learning. Since high-resolution aerial images can be easily accessed all over the world nowadays, our approach makes use of aerial images. Taking an aerial image as input, our approach iteratively generates the road network graph vertex by vertex. It can handle complex intersections as well as road segments in various scenarios. We evaluate our approach on a publicly available dataset, and comparison experiments demonstrate its superiority. A demo video accompanying our work is available at \url{https://tonyxuqaq.github.io/projects/rngdet/}.
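A sketch of the generic vertex-by-vertex decoding loop described above; the `predict_next` callable stands in for the learned transformer decoder and is purely hypothetical.

```python
def grow_graph(predict_next, start, max_steps=100):
    """Iteratively grow a road graph vertex by vertex.

    Sketch of a generic decoding loop only; `predict_next` is a hypothetical
    callable returning candidate next vertices, or an empty list to stop a branch.
    """
    vertices, edges = [tuple(start)], []
    frontier = [tuple(start)]                 # vertices still being expanded
    for _ in range(max_steps):
        if not frontier:
            break
        current = frontier.pop()
        for nxt in predict_next(current):     # the decoder proposes next vertices
            nxt = tuple(nxt)
            edges.append((current, nxt))
            if nxt not in vertices:           # new vertex: keep exploring from it
                vertices.append(nxt)
                frontier.append(nxt)
    return vertices, edges

# Toy "decoder": walk east in 10-pixel steps until x reaches 50
toy = lambda v: [(v[0] + 10, v[1])] if v[0] < 50 else []
print(grow_graph(toy, (0, 0)))
```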
In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), in which a number of agents are deployed as a partially connected network and each agent only interacts with nearby agents. Networked MARL requires all agents to make decisions in a decentralized manner so as to optimize a global objective with restricted communication between neighbors over the network. Inspired by the fact that sharing plays a key role in human cooperation, we propose a hierarchically decentralized MARL framework that enables agents to learn to dynamically share rewards with their neighbors, so as to encourage agents to cooperate on the global objective. For each agent, the high-level policy learns how to share its reward with neighbors to decompose the global objective, while the low-level policy learns to optimize the local objective induced by the high-level policies of its neighborhood. The two policies form a bi-level optimization and are learned alternately. We empirically demonstrate that LToS outperforms existing methods in both social dilemmas and networked MARL scenarios.
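To make the reward-sharing idea concrete, the sketch below shapes each agent's reward as a mix of the reward it keeps and the rewards its neighbors choose to give away. The parameterization of the sharing weights is a hypothetical stand-in for the high-level policy, not the paper's exact formulation.

```python
import numpy as np

def shared_rewards(rewards, adjacency, share_weights):
    """Compute each agent's shaped reward after neighbors share rewards.

    share_weights[i, j] is the fraction of agent i's reward given to neighbor j
    (a hypothetical parameterization of the high-level policy).
    """
    n = len(rewards)
    shaped = np.zeros(n)
    for i in range(n):
        kept = 1.0 - share_weights[i][adjacency[i]].sum()
        shaped[i] += kept * rewards[i]                      # reward agent i keeps
        for j in adjacency[i]:
            shaped[j] += share_weights[i, j] * rewards[i]   # reward given to neighbor j
    return shaped

rewards = np.array([1.0, 0.0, 2.0])
adjacency = {0: [1], 1: [0, 2], 2: [1]}   # a line graph: 0 - 1 - 2
w = np.zeros((3, 3)); w[0, 1] = 0.5; w[2, 1] = 0.25
print(shared_rewards(rewards, adjacency, w))  # [0.5, 1.0, 1.5]
```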
High-definition (HD) maps can provide precise geometric and semantic information about the static traffic environment for autonomous driving. Road boundaries are among the most important pieces of information contained in HD maps, since they distinguish road areas from off-road areas and can guide vehicles to drive within the road region. But it is labor-intensive to annotate road boundaries for HD maps at city scale. To enable automatic HD map annotation, current work uses semantic segmentation or iterative graph growing for road boundary detection. However, the former cannot ensure topological correctness since it works at the pixel level, while the latter suffers from inefficiency and drifting issues. To address these problems, in this letter we propose a new system termed CSBoundary to automatically detect road boundaries for HD map annotation at city scale. Our network takes an aerial image patch as input and directly infers the continuous road-boundary graph (i.e., vertices and edges) from the image. To generate the city-scale road-boundary graph, we stitch together the graphs obtained from all image patches. CSBoundary is evaluated and compared on a public benchmark dataset, and the results demonstrate its superiority. An accompanying demo video is available on our project page \url{https://sites.google.com/view/csbound/}.
Federated learning (FL) has become an important machine learning paradigm in which a global model is trained on private data distributed across clients. However, due to distribution shift, most existing FL algorithms cannot guarantee fair performance across different clients or different groups of samples. Recent studies focus on achieving fairness among clients, but they ignore fairness for groups formed by sensitive attributes (e.g., gender and/or race), which is important and practical in real applications. To bridge this gap, we formulate the objective of unified group fairness, which is to learn a fair global model with similar performance across different groups. To achieve unified group fairness for arbitrary sensitive attributes, we propose a novel FL algorithm, named Group Distributionally Robust Federated Averaging (G-DRFA), which mitigates the distribution shift across groups and comes with a theoretical analysis of its convergence rate. Specifically, we treat the performance of the federated global model on each group as an objective and adopt distributionally robust techniques to maximize the performance of the worst-performing group over an uncertainty set defined by group re-weighting. We validate the advantages of the G-DRFA algorithm in experiments, and the results show that G-DRFA outperforms existing fair federated learning algorithms on unified group fairness.
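The group re-weighting at the core of such distributionally robust objectives can be sketched as an exponentiated-gradient update that shifts weight toward the worst-performing group. The update form and step size below are generic group-DRO choices, not necessarily G-DRFA's.

```python
import numpy as np

def update_group_weights(weights, group_losses, step_size=0.5):
    """One exponentiated-gradient step on the group re-weighting vector.

    Generic group-DRO style update (weights grow on high-loss groups);
    the step size and exact update are illustrative assumptions.
    """
    weights = weights * np.exp(step_size * group_losses)
    return weights / weights.sum()   # project back onto the probability simplex

# Two demographic groups; the second currently has the higher loss
w = np.array([0.5, 0.5])
for _ in range(5):
    w = update_group_weights(w, group_losses=np.array([0.2, 0.8]))
print(w)  # weight shifts toward the worst-performing group
```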
Web search is an essential way for humans to obtain information, but understanding the contents of web pages remains a great challenge for machines. In this paper, we introduce the task of structural reading comprehension (SRC) on the web. Given a web page and a question about it, the task is to find the answer from the web page. This task requires a system to understand not only the semantics of the text but also the structure of the web page. Moreover, we propose WebSRC, a novel web-based structural reading comprehension dataset. WebSRC consists of 400K question-answer pairs collected from 6.4K web pages. Along with the QA pairs, our dataset also provides the corresponding HTML source code, screenshots, and metadata. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no. We evaluate various baselines on our dataset to show the difficulty of our task. We also investigate the usefulness of structural information and visual features. Our dataset and baselines have been publicly released at https://x-lance.github.io/websrc/.
Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problem has been widely studied in many applications, but has only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory-augmented, lookup-dictionary-based Transformer architecture for LMs. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. Through extensive experiments on Chinese and English data sets, our proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate. This is achieved without impacting decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting ASR decoding performance, especially for long-tail tokens.
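A kNN-LM-style sketch of memory-augmented lookup: retrieve a next-token distribution from a key-value dictionary of training contexts and blend it with the LM's own distribution. This only illustrates the general idea; the paper's dictionary construction and fusion scheme are not reproduced, and the interpolation weight and temperature are assumed.

```python
import numpy as np

def lookup_augmented_probs(hidden, keys, values, lm_probs, lam=0.3, temp=1.0):
    """Blend LM probabilities with a distribution retrieved from a
    key-value lookup dictionary of stored training contexts.

    Illustrative sketch only; lam and temp are assumed values.
    """
    # Similarity of the current context to every stored training context
    sims = keys @ hidden / temp
    attn = np.exp(sims - sims.max())
    attn /= attn.sum()
    # Aggregate the stored next-token indicators into a distribution
    mem_probs = attn @ values
    return lam * mem_probs + (1.0 - lam) * lm_probs

vocab, d, n = 8, 4, 16
rng = np.random.default_rng(0)
keys = rng.normal(size=(n, d))                     # stored context embeddings
values = np.eye(vocab)[rng.integers(0, vocab, n)]  # their observed next tokens
lm_probs = np.full(vocab, 1.0 / vocab)
print(lookup_augmented_probs(rng.normal(size=d), keys, values, lm_probs).sum())  # ~1.0
```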
Modern mobile burst photography pipelines capture and merge a short sequence of frames to recover an enhanced image, but often disregard the 3D nature of the scene they capture, treating pixel motion between images as a 2D aggregation problem. We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth. To this end, we devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion. Our plane plus depth model is trained end-to-end, and performs coarse-to-fine refinement by controlling which multi-resolution volume features the network has access to at what time during training. We validate the method experimentally, and demonstrate geometrically accurate depth reconstructions with no additional hardware or separate data pre-processing and pose-estimation steps.